Soujanya Vadapalli, International Institute of Information Technology, Hyderabad [PRIMARY contact]
Shraddha Agrawal, International Institute of Information Technology, Hyderabad
Sravanthi Kollukuduru, International Institute of Information Technology, Hyderabad
Kamalakar Karlapalem, International Institute of Information Technology, Hyderabad [Faculty advisor]
We built the Badge and Network Traffic (BNT)
tool to create animations of the events taking place in the embassy. Using the
embassy layout, time-stamps, the prox-card and web-access entries, we animated
color-based flagging of events. From the employee information table, we
obtained the office location of each employee in the embassy building layout.
Each block in the layout is associated with the corresponding employee. 
The colors of a block and the associated events are given below:
1. WHITE (default) - No prox-in-building entry has occurred up to that point in time.
2. GREEN - A prox-in-building entry occurred; the block is colored green from then on.
3. BLUE - A prox-in-classified entry occurred; the block remains blue until prox-out-classified occurs.
4. 'W' - A web-access occurred from the corresponding source ip; the block is highlighted with a 'W' written on it.
5. RED - A web-access occurred from the corresponding source ip while the block was blue; the block is marked red, indicating a suspicious event.
Other events:
1. If the prox-out-classified event occurs when the block color is blue, the block's color is restored to green.
2. When a block is colored RED, a couple of plots displaying the ratio of reqSize to respSize of the source ip and the dest ip are displayed for further evaluation.
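The color rules above amount to a small state machine per block. The following is a minimal sketch of that idea, not the BNT tool's actual code: the function names `next_color` and `replay` and the event strings are assumptions made for illustration, the 'W' overlay and the ratio plots are left out, and restoring a red block to green on prox-out-classified is also an assumption.

```python
# Color labels and event names follow the description above; the function
# names are hypothetical and chosen only for this sketch.
WHITE, GREEN, BLUE, RED = "white", "green", "blue", "red"


def next_color(color, event):
    """Return the block color after one event for the block's employee."""
    if event == "prox-in-building":
        return GREEN
    if event == "prox-in-classified":
        return BLUE
    if event == "prox-out-classified":
        # Leaving the classified area restores green.
        # Assumption: a block that turned red is restored as well.
        return GREEN if color in (BLUE, RED) else color
    if event == "web-access":
        # A web-access while the employee is in the classified area is suspicious.
        return RED if color in (BLUE, RED) else color
    raise ValueError(f"unknown event: {event}")


def replay(events):
    """Apply a time-ordered list of events to a block that starts out white."""
    color = WHITE
    history = []
    for event in events:
        color = next_color(color, event)
        history.append(color)
    return history
```

Replaying one employee's events for a day in time order gives the sequence of colors that an animation frame generator would draw for that block.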
Each day's events are animated and are available for viewing at the web-url given below. There is also an animation that displays only the suspicious events. The BNT tool was developed using Python, PyX (graphics API) and JavaScript.
Developers: Shraddha Agrawal, Sravanthi Kollukuduru
Tool is available here: http://cde.iiit.ac.in/~soujanya/cgi-bin/VAST/home.cgi
ANSWERS:
MC1.1: Identify which computer(s) the employee most likely used to send information to his contact in a tab-delimited table which contains for each computer identified: when the information was sent, how much information was sent and where that information was sent.
MC1.2: Characterize the patterns of behavior of suspicious computer use.
VAST Mini-challenge 1: Detailed answer 
From the task description and data set provided, we identify each record (row) in the tables of prox-card logs and web access logs as an event. An event, thus, is one of the following: (a) prox-card event: an event that indicates either an entry into the embassy by an employee, an entry into the restricted zone or an exit from the restricted zone. (b) web-access event: an event that indicates a web access from a machine in the embassy (usually referred to as source ip) to a destination machine on the net (referred to as destination ip).
Given these event types and the logical constraints from the task description (an employee makes web-accesses only from his allotted machine; entries into the restricted area are always recorded, and no piggy-backing is allowed there), we formulated a few logical inconsistencies that could appear in the data. Whenever an event takes place, we check whether it leads to a logical inconsistency. If it does, we flag the event as suspicious and gather other information related to that event to validate whether it is indeed suspicious.
The conditions we flag are:
1. An employee is in the restricted area and there is a web-access from his allotted machine.
2. An employee does not have a prox-in-building event and there is a web-access from his allotted machine. This could simply mean that the employee piggy-backed into the building, but we still flag such an event as a suspicious candidate for further evaluation.
3. For each web-access, if the ratio (reqSize / respSize) is high, we flag the event for further evaluation.
4. The usual time-slots during which each user accesses the web are plotted; an access outside these time-slots is flagged as suspicious.
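Conditions 1 and 2 can be checked by replaying both logs in time order while tracking each employee's badge state. The sketch below is illustrative only and makes several assumptions: the tuple layouts, the `ip_to_employee` mapping, the function name `flag_web_accesses`, and the choice to process a prox event before a web event when their time-stamps are equal are all hypothetical, and the example records are made up rather than taken from the challenge logs.

```python
from datetime import datetime


def flag_web_accesses(prox_events, web_events, ip_to_employee):
    """Flag web-accesses that contradict the badge state of the machine's owner.

    prox_events:    list of (timestamp, employee_id, kind)
    web_events:     list of (timestamp, source_ip)
    ip_to_employee: dict mapping source ip -> employee id
    """
    merged = [(t, 0, ("prox", emp, kind)) for t, emp, kind in prox_events]
    merged += [(t, 1, ("web", ip, None)) for t, ip in web_events]
    # Sort by time; on ties, prox events (0) are handled before web events (1).
    merged.sort(key=lambda item: (item[0], item[1]))

    in_building = set()    # (date, employee) pairs with a prox-in-building entry
    in_classified = set()  # employees currently inside the classified area
    flags = []
    for t, _, (source, key, kind) in merged:
        if source == "prox":
            if kind == "prox-in-building":
                in_building.add((t.date(), key))
            elif kind == "prox-in-classified":
                in_classified.add(key)
            elif kind == "prox-out-classified":
                in_classified.discard(key)
        else:
            emp = ip_to_employee[key]
            if emp in in_classified:
                flags.append((t, key, "in-classified"))        # condition 1
            elif (t.date(), emp) not in in_building:
                flags.append((t, key, "no-prox-in-building"))  # condition 2
    return flags


# Made-up example data (not from the challenge logs).
prox = [
    (datetime(2008, 1, 2, 8, 0), 1, "prox-in-building"),
    (datetime(2008, 1, 2, 9, 0), 1, "prox-in-classified"),
    (datetime(2008, 1, 2, 10, 0), 1, "prox-out-classified"),
]
web = [
    (datetime(2008, 1, 2, 9, 30), "10.0.0.1"),  # owner is in the classified area
    (datetime(2008, 1, 2, 11, 0), "10.0.0.2"),  # owner never badged in that day
]
for flag in flag_web_accesses(prox, web, {"10.0.0.1": 1, "10.0.0.2": 2}):
    print(flag)
```

Keeping the reason string with each flag makes it easy to separate the stronger condition 1 hits from the weaker condition 2 candidates during later review.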
Observations on data 
1. There are 60 employees and 60 machines, one allotted to each employee.
2. The number of unique destination ips is 20243.
3. Data transfer took place through three ports: 80, 25 and 8080.
4. All web-accesses through port 25 go to a single destination ip, '37.170.30.250'. As port 25 corresponds to the Simple Mail Transfer Protocol, we infer that this destination ip is probably the embassy's mail server; web-access requests made to it are excluded from the suspicious-event analysis.
5. Whenever the ratio of the reqSize (request size) of a web-access to its respSize (response size) is high, we treat the web-access as a heavy data transfer (upload) from the source ip to the destination ip.
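The ratio check and the port 25 exclusion can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code itself: the record keys (`source_ip`, `dest_ip`, `port`, `req_size`, `resp_size`), the function names and the `threshold` parameter are hypothetical, and no particular cut-off for a "high" ratio is given in the text.

```python
MAIL_PORT = 25  # port 25 traffic is treated as mail and skipped


def upload_ratio(req_size, resp_size):
    """Ratio of request size to response size; infinite if nothing came back."""
    if resp_size == 0:
        return float("inf")
    return req_size / resp_size


def heavy_uploads(records, threshold):
    """Return (source_ip, dest_ip, ratio) for non-mail accesses above threshold.

    records: iterable of dicts with the hypothetical keys
             source_ip, dest_ip, port, req_size, resp_size
    """
    flagged = []
    for r in records:
        if r["port"] == MAIL_PORT:
            continue
        ratio = upload_ratio(r["req_size"], r["resp_size"])
        if ratio > threshold:
            flagged.append((r["source_ip"], r["dest_ip"], ratio))
    return flagged
```

The threshold is left as a parameter because a suitable value depends on the usual ratio distribution of each source ip, which is what the plots are used to judge.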
GUI-assisted Data Analysis
1. Web-access events through ports 80 and 8080 are analyzed through plots.
2. The regular web-access patterns of each user on his respective machine are
analyzed through plots.
3. Animation of events:
Since the prox-card entries and the web-access entries are time-stamped, we drew the layout of the building from the embassy layout image provided and enabled color-based flagging of the events associated with the employees. From the employee information table, we obtained the office location of each employee in the embassy building layout. Each block is associated with the corresponding employee and is colored white by default.
The colors and the associated events are described in the tool description above. The animations are made day-wise, and the list of suspicious events is compiled into a separate summary animation. The animations for all the 31 days are available online here: http://cde.iiit.ac.in/~soujanya/cgi-bin/vast/home.cgi
Suspicious events
We identified 8 cases in which a web-access is made from a source ip while the machine's employee is in the restricted area (there is a prox-in-classified entry but no prox-out-classified entry yet). We flag these events and compare them against the usual request size to response size ratio of the source ip. An unusually high value of this ratio indicates a heavy data transfer in the absence of the employee. All 8 of these web-accesses are made to a single destination ip, 100.59.151.133, through port 8080.
We flagged this ip address as a possible recipient of information leaked from the embassy and retrieved the other web-accesses to it. There are 10 such web-accesses, and the prox-card entries are temporally consistent with their time-stamps. For these web-accesses we therefore examined other related information closely: the number of employees present in the embassy at that time, the number of employees present in the classified area at that time (as shown in Table 1 below) and the regular web-usage pattern of each source ip involved. From these patterns, we check whether the access is unusual. For instance, a web-access around 17:00 is unusual when on most other days there is no web activity around that time. Similarly, if the last web-access on every other day is no later than 16:00, a single day with a web-access only to this destination ip and a heavy data transfer between 18:00 and 19:00 is unusual.
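The "unusually late access" pattern described above was judged visually from plots. As a rough programmatic counterpart, the sketch below compares the last access time of each day for one source ip and reports the days that end much later than every other day. The function name `unusual_late_days` and the `margin_minutes` parameter are assumptions made for this example, and the sample time-stamps are made up.

```python
from datetime import datetime


def unusual_late_days(timestamps, margin_minutes=60):
    """Return the dates whose last access is later than on all other days.

    timestamps:     access times of a single source ip
    margin_minutes: hypothetical slack before a day counts as unusual
    """
    last = {}  # date -> minutes after midnight of the last access that day
    for t in timestamps:
        minutes = t.hour * 60 + t.minute
        last[t.date()] = max(last.get(t.date(), 0), minutes)

    flagged = []
    for day, minutes in last.items():
        others = [m for d, m in last.items() if d != day]
        if others and minutes > max(others) + margin_minutes:
            flagged.append(day)
    return sorted(flagged)


# Made-up example: two ordinary days and one day with an evening access.
sample = [
    datetime(2008, 1, 1, 9, 5), datetime(2008, 1, 1, 15, 50),
    datetime(2008, 1, 2, 10, 0), datetime(2008, 1, 2, 16, 0),
    datetime(2008, 1, 3, 11, 0), datetime(2008, 1, 3, 18, 30),
]
print(unusual_late_days(sample))
```

A check like this only narrows down candidates; the flagged days still need to be inspected against the plots and the prox-card state.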
A total of 18 web-accesses to this destination ip are found in the web log entries and all these accesses have an unusual pattern; we thus identify these 18 events as the suspicious events.
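The step from the flagged accesses to the full set of accesses sharing their destination can be sketched as below. This is a minimal illustration, assuming each log entry is a dict with a hypothetical `dest_ip` key; the function name `expand_by_destination` is also an assumption.

```python
from collections import Counter


def expand_by_destination(flagged, all_accesses):
    """Find the destination most common among flagged accesses and
    return it together with every access in the log to that destination."""
    counts = Counter(a["dest_ip"] for a in flagged)
    if not counts:
        return None, []
    top_dest, _ = counts.most_common(1)[0]
    related = [a for a in all_accesses if a["dest_ip"] == top_dest]
    return top_dest, related
```

The accesses in `related` that were not already flagged are the ones that need the additional context checks, since their prox-card state alone does not mark them as inconsistent.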
Table 1: List of suspicious events and some statistics*
For destination ip: 100.59.151.133 and port: 8080
SNo  Source ip      Date       Time   Ratio        Why/How
1.   37.170.100.5   1/31/2008  9:41   251.9230144  WARA^
2.   37.170.100.41  1/17/2008  12:12  150.6416902  WARA, No prox-in entry for employee 40
3.   37.170.100.41  1/29/2008  16:08  116.6890521  WARA, Employee 40 is also in restricted area
4.   37.170.100.15  1/31/2008  13:10  806.6132764  WARA
5.   37.170.100.16  1/15/2008  16:14  274.6528527  WARA
6.   37.170.100.16  1/10/2008  16:01  693.8860461  WARA
7.   37.170.100.56  1/29/2008  15:41  339.075055   WARA, Employee 57 is also in restricted area
8.   37.170.100.31  1/10/2008  14:27  293.2205243  WARA
9.   37.170.100.31  1/8/2008   17:01  727.291      35 employees had left after 17:00 (prox-out-classified)
10.  37.170.100.31  1/15/2008  17:03  664.1519     43 employees had left after 17:00 (based on prox-out-classified)
11.  37.170.100.18  1/17/2008  17:57  232.7596     38 employees had left after 17:00 (based on prox-out-classified)
12.  37.170.100.13  1/22/2008  8:50   236.4215     No prox-in building, has the first access to dest ip; 22 entered before 8:50 am (prox-in-building)
13.  37.170.100.16  1/22/2008  17:41  528.876      40 employees left after 17:00 (prox-out-classified)
14.  37.170.100.10  1/24/2008  9:46   329.0354     41 employees had entered building before 9:46 am
15.  37.170.100.32  1/24/2008  10:26  246.0819     -----None----
16.  37.170.100.20  1/24/2008  17:07  229.8254     39 employees left after 17:00, 21 was in restricted area
17.  37.170.100.20  1/29/2008  16:38  142.2871     21 was in restricted area
18.  37.170.100.8   1/31/2008  16:02  28.1967      9 was in restricted area
^ WARA - Web access from source ip while corresponding employee in restricted area
* All these entries are also accompanied by the ratio plots and the web-usage patterns of the source ips, for visual evaluation. These plots can be viewed at the BNT tool web-url mentioned above.